Access Structures for Advanced Similarity Search in Metric Spaces
Abstract
Similarity retrieval is an important paradigm for searching in environments where exact match has little meaning. To enlarge the set of data types for which similarity search can be performed efficiently, the mathematical notion of a metric space provides a useful abstraction of similarity. In this paper we consider the problem of organizing and searching large data sets from arbitrary metric spaces, and we discuss a novel access structure for similarity search in metric data, called D-Index. D-Index combines a novel clustering technique with a pivot-based distance searching strategy to speed up similarity range and nearest neighbor queries over large files of objects stored on disk. We also propose an extension of this access structure (eD-Index) that handles the similarity self join. Although this approach cannot eliminate the intrinsic quadratic complexity of similarity joins, experiments confirm significant performance improvements.
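The pivot-based strategy mentioned in the abstract rests on the triangle inequality: for any pivot p, |d(q, p) - d(o, p)| is a lower bound on d(q, o), so an object can be discarded without computing its distance to the query whenever that bound exceeds the search radius. The following is a minimal sketch of generic pivot filtering for range queries, not the D-Index itself; the class name `PivotTable`, the function `euclidean`, and the in-memory layout are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, used here as an example metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class PivotTable:
    """Hypothetical in-memory pivot table (illustrative, not the D-Index)."""

    def __init__(
        self,
        objects: Sequence[Point],
        pivots: Sequence[Point],
        dist: Callable[[Point, Point], float],
    ) -> None:
        self.objects = list(objects)
        self.pivots = list(pivots)
        self.dist = dist
        # Precompute the distance from every object to every pivot.
        self.table = [[dist(o, p) for p in self.pivots] for o in self.objects]

    def range_query(self, q: Point, r: float) -> Tuple[List[Point], int]:
        """Return the objects within distance r of q and the number of
        distance computations performed at query time."""
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        computed = len(self.pivots)
        results = []
        for obj, row in zip(self.objects, self.table):
            # Triangle inequality: |d(q,p) - d(o,p)| <= d(q,o).
            # If the lower bound exceeds r for any pivot, obj cannot qualify.
            if any(abs(dq - do) > r for dq, do in zip(q_to_pivots, row)):
                continue
            computed += 1
            if self.dist(q, obj) <= r:
                results.append(obj)
        return results, computed
```

Calling `PivotTable(objects, pivots, euclidean).range_query(q, r)` returns the same answer as a linear scan, typically with fewer distance computations when the pivots separate the data well. The filter only ever discards objects that cannot qualify, so the result is exact for any function that satisfies the metric axioms.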
Similar Resources
A Content-Addressable Network for Similarity Search in Metric Spaces
Because of the ongoing digital data explosion, more advanced search paradigms than the traditional exact match are needed for content-based retrieval in huge and ever-growing collections of data produced in application areas such as multimedia, molecular biology, marketing, computer-aided design and purchasing assistance. As the variety of data types is fast going towards creating a database uti...
A Hashed Schema for Similarity Search in Metric Spaces (invited talk)
A novel access structure for similarity search in metric data, called Similarity Hashing (SH), is proposed. Its multi-level hash structure of separable buckets on each level supports easy insertion and bounded search costs, because at most one bucket needs to be accessed at each level for range queries up to a pre-defined value of search radius. At the same time, the number of distance computati...
New Approaches to Similarity Searching in Metric Spaces
Dissertation by Cengiz Celik, Doctor of Philosophy, 2006. Directed by Professor David Mount, Department of Computer Science. The complex and unstructured nature of many types of data, such as multimedia objects, text documents, and protein sequences, requires the use of similarity search techniques for retrieval of informatio...
Scalable and Distributed Similarity Search in Metric Spaces
In this paper we propose a new access structure, called GHT*, based on generalized hyperplane tree (GHT) and distributed dynamic hashing (DDH) techniques. GHT* is a distributed structure that supports range search in a metric space according to a distance function d. The structure does not require a central directory and it is able to gracefully scale through splits of one bucket at a...
Similarity Search in Metric Spaces
Similarity search refers to any searching problem which retrieves objects from a set that are close to a given query object as reflected by some similarity criterion. It has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. In this thesis, we examine algorithms designed for similarity search over arbitrar...